
    Clustering in an Object-Oriented Environment

    This paper describes the incorporation of seven stand-alone clustering programs into S-PLUS, where they can now be used in a much more flexible way. The original Fortran programs implemented the cluster analysis algorithms introduced in the book by Kaufman and Rousseeuw (1990). These clustering methods were designed to be robust and to accept dissimilarity data as well as objects-by-variables data. Moreover, each of them provides a graphical display and a quality index reflecting the strength of the clustering. The powerful graphics of S-PLUS made it possible to improve these graphical representations considerably. The clustering algorithms were integrated according to the object-oriented principle supported by S-PLUS. The new functions have a uniform interface and are compatible with existing S-PLUS functions. We describe the basic idea and the use of each clustering method, together with its graphical features. Each function is briefly illustrated with an example.

    Robust Learning from Bites for Data Mining

    Some methods from statistical machine learning and from robust statistics have two drawbacks. First, they are so computer-intensive that they can hardly be used for massive data sets, say with millions of data points. Second, robust and non-parametric confidence intervals for the predictions of the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed over several processors if available, and can help to reduce the computation time substantially. Our main focus is on robust general support vector machines (SVM) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful for fitting robust estimators in parametric models for huge data sets.
    Keywords: breakdown point, convex risk minimization, data mining, distributed computing, influence function, logistic regression, robustness, scalability
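
    The abstract does not spell out the procedure, so the following is only a minimal sketch of one way such a scheme could look, under stated assumptions: the data are split into disjoint subsets ("bites"), a model is fitted on each bite, the fits are aggregated by their median, and a distribution-free interval for that median is read off the order statistics using the Binomial(n, 1/2) distribution. All function names are hypothetical, and `fit_median_slope` is a deliberately simple stand-in for the robust SVM the paper focuses on.

```python
import math
import random
import statistics


def split_into_bites(data, n_bites, seed=0):
    """Shuffle the data and cut it into n_bites disjoint subsets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_bites] for i in range(n_bites)]


def fit_median_slope(bite):
    """Stand-in robust fit (assumption): median of the slopes y / x."""
    return statistics.median(y / x for x, y in bite if x != 0)


def binom_cdf_half(k, n):
    """P(B <= k) for B ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n


def median_with_interval(values, level=0.95):
    """Median and a distribution-free interval [x_(r), x_(n-r+1)].

    r is the largest rank whose interval still has coverage >= level;
    if no rank qualifies, the full range is returned with its true coverage.
    """
    ordered = sorted(values)
    n = len(ordered)
    alpha = 1.0 - level
    r = 1
    while r + 1 <= n // 2 and 2 * binom_cdf_half(r, n) <= alpha:
        r += 1
    coverage = 1.0 - 2 * binom_cdf_half(r - 1, n)
    return statistics.median(ordered), ordered[r - 1], ordered[n - r], coverage


def estimate_from_bites(data, n_bites, fit, level=0.95):
    """Fit each bite separately, then aggregate the fits by their median."""
    fits = [fit(bite) for bite in split_into_bites(data, n_bites)]
    return median_with_interval(fits, level)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: slope 2 plus noise, with 5% gross outliers in y.
    data = []
    for _ in range(20000):
        x = rng.uniform(1.0, 10.0)
        y = 50.0 if rng.random() < 0.05 else 2.0 * x + rng.gauss(0.0, 0.5)
        data.append((x, y))
    print(estimate_from_bites(data, n_bites=20, fit=fit_median_slope))
```

    Because each bite is fitted independently, the list comprehension in `estimate_from_bites` is the part that could be distributed over several processors. With 20 bites and a 95% level, the interval in this sketch runs from the 6th to the 15th ordered fit, with exact coverage of about 0.959.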

    Finding Outliers in Surface Data and Video

    Surface, image and video data can be considered as functional data with a bivariate domain. To detect outlying surfaces or images, a new method is proposed based on the mean and the variability of the degree of outlyingness at each grid point. A rule is constructed to flag the outliers in the resulting functional outlier map. Heatmaps of their outlyingness indicate the regions that deviate most from the regular surfaces. The method is applied to fluorescence excitation-emission spectra after fitting a PARAFAC model, to MRI image data augmented with their gradients, and to video surveillance data.
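
    As a rough illustration of the two coordinates mentioned in the abstract, the sketch below computes an outlyingness value at each grid point and then summarizes every surface by the mean and the variability of those values. It rests on assumptions not taken from the paper: outlyingness is measured here by a plain robust z-score (absolute deviation from the pointwise median, divided by the scaled MAD), variability by the standard deviation, and all function names are hypothetical. The paper's own outlyingness measure and flagging rule are not given in the abstract and are not reproduced.

```python
import statistics


def pointwise_outlyingness(surfaces):
    """Robust z-score of every surface at every grid point.

    surfaces: list of equally shaped 2D grids (lists of lists of floats).
    Returns a list of grids with the same shape.
    """
    n_rows, n_cols = len(surfaces[0]), len(surfaces[0][0])
    result = [[[0.0] * n_cols for _ in range(n_rows)] for _ in surfaces]
    for i in range(n_rows):
        for j in range(n_cols):
            values = [s[i][j] for s in surfaces]
            med = statistics.median(values)
            # 1.4826 makes the MAD consistent with the standard deviation
            # under normality.
            mad = 1.4826 * statistics.median(abs(v - med) for v in values)
            for k, v in enumerate(values):
                result[k][i][j] = abs(v - med) / mad if mad > 0 else 0.0
    return result


def outlier_map_coordinates(outlyingness):
    """One (mean, standard deviation) pair of outlyingness per surface."""
    coords = []
    for grid in outlyingness:
        flat = [v for row in grid for v in row]
        coords.append((statistics.fmean(flat), statistics.pstdev(flat)))
    return coords


def flag_by_mean(coords, cutoff):
    """Hypothetical rule: flag surfaces whose mean outlyingness exceeds cutoff."""
    return [k for k, (mean, _) in enumerate(coords) if mean > cutoff]
```

    The grids returned by `pointwise_outlyingness` can be drawn directly as heatmaps to show where a flagged surface deviates; the `cutoff` argument is left to the user because no threshold is stated in the source.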

    Inflation, relative prices and nominal rigidities

    This paper examines the distribution of Belgian consumer prices and its interaction with aggregate inflation over the period June 1976-September 2000. Given the fat-tailed nature of this distribution, both classical and robust measures of location, scale and skewness are presented. We found a positive short-run impact of the skewness of relative prices on aggregate inflation, irrespective of the average inflation rate. The dispersion of relative prices also has a positive impact on aggregate inflation in the short run, and this impact is significantly lower in the sub-sample starting in 1988 than in the pre-1988 sub-sample, suggesting that the prevailing monetary policy regime has a substantial effect on this coefficient. The chronic right skewness of the distribution, revealed by the robust measures, is positively cointegrated with aggregate inflation, suggesting that it is largely dependent on the inflationary process itself and would disappear at zero inflation. These results have three important implications for monetary policy. First, as to the transmission of monetary policy, our results are in line with the predictions of menu cost models and therefore suggest that this type of friction can be an important factor behind the short-run non-neutrality of monetary policy. Second, as to the design of robust estimators of core inflation, economic arguments based on menu cost models tend to highlight the importance of the absence of bias. We have proposed an unbiased estimator by taking the time-varying degree of chronic right skewness explicitly into account. Third, as to the optimal rate of inflation, the chronic right skewness found in the data provides no argument against price stability, as it appears to be an endogenous response of optimising price setters and would disappear when targeting a zero inflation rate. This conclusion contrasts sharply with the implications of the exogenously assumed downward rigidity of Tobin (1972), which would justify targeting a sufficiently positive inflation rate in order to facilitate the adjustment of relative prices. Our empirical findings contradict the latter type of downward rigidity, which implies a negative correlation between skewness and inflation. Therefore, the cross-sectional properties of Belgian inflation data do not provide strong arguments against a price stability-oriented monetary policy, such as the one pursued by the Eurosystem.